On Consistent Checkpointing in Distributed Systems
Abstract
Consistent checkpointing simplifies failure recovery and eliminates the domino effect in case of failure by preserving a consistent global checkpoint on stable storage. However, the approach suffers from the high overhead associated with the checkpointing process. Two approaches are used to reduce the overhead: one is to minimize the number of synchronization messages and the number of checkpoints; the other is to make the checkpointing process non-blocking. These two approaches were orthogonal in previous years until the Prakash-Singhal algorithm [17] combined them. In other words, the Prakash-Singhal algorithm forces only a minimum number of processes to take checkpoints, and it does not block the underlying computation. In this paper, we identify two problems in their algorithm [17] and prove that there does not exist a non-blocking algorithm that forces only a minimum number of processes to take their checkpoints. Based on the proof, we present an efficient algorithm that neither forces all processes to take checkpoints nor blocks the underlying computation during checkpointing. Correctness proofs are also provided.
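To make the term "consistent global checkpoint" concrete, the following is a minimal sketch under the standard definition: a set of local checkpoints is consistent if it contains no orphan message, that is, no message recorded as received whose send is not recorded. It is not the algorithm of this paper or of Prakash-Singhal [17]; the names `Message`, `is_consistent`, and the event-index representation are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical representation: events are indexed per process, from 0.
    sender: str
    send_event: int   # index of the send event in the sender's history
    receiver: str
    recv_event: int   # index of the receive event in the receiver's history


def is_consistent(checkpoint, messages):
    """Return True if the global checkpoint has no orphan message.

    `checkpoint` maps each process name to the number of local events
    its checkpoint includes (events 0 .. n-1 are recorded).
    """
    for m in messages:
        received = m.recv_event < checkpoint[m.receiver]
        sent = m.send_event < checkpoint[m.sender]
        if received and not sent:
            return False  # receive is recorded but the send is not
    return True


# P1 sends at its event 2; P2 receives at its event 1.
messages = [Message("P1", 2, "P2", 1)]

consistent_cut = {"P1": 3, "P2": 2}    # records both send and receive
inconsistent_cut = {"P1": 2, "P2": 2}  # records the receive only

print(is_consistent(consistent_cut, messages))
print(is_consistent(inconsistent_cut, messages))
```

Rolling back to `inconsistent_cut` would leave P2 holding a message that P1 never sent, which is the situation that can cascade into the domino effect mentioned above.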
Similar resources
Necessary and sufficient conditions for transaction-consistent global checkpoints in a distributed database system
Checkpointing and rollback recovery are well-known techniques for handling failures in distributed systems. The issues related to the design and implementation of efficient checkpointing and recovery techniques for distributed systems have been thoroughly understood. For example, the necessary and sufficient conditions for a set of checkpoints to be part of a consistent global checkpoint has be...
Transaction-Consistent Global Checkpoints in a Distributed Database System
Checkpointing and rollback recovery are well-known techniques for handling failures in distributed database systems. In this paper, we establish the necessary and sufficient conditions for the checkpoints on a set of data items to be part of a transaction-consistent global checkpoint of the distributed database. This can throw light on designing efficient, non-intrusive checkpointing techniques...
Review of Some Checkpointing Schemes for Distributed and Mobile Computing Environments
Mr Raman Kumar, Mewar University, Chittorgarh (Raj); Dr Parveen Kumar, Amity University Gurgaon (Haryana). Abstract: Fault Tolerance Techniques facilitate systems to carry out tasks in the incidence of faults. A checkpoint is a...
An Index-Based Checkpointing Algorithm for Autonomous Distributed Systems
This paper presents an index-based checkpointing algorithm for distributed systems with the aim of reducing the total number of checkpoints while ensuring that each checkpoint belongs to at least one consistent global checkpoint or recovery line. The algorithm is based on an equivalence relation defined between pairs of successive checkpoints of a process which allows in some cases to advance the...
The Performance of Consistent Checkpointing in Distributed Shared Memory Systems
This paper presents the design and implementation of a consistent checkpointing scheme for Distributed Shared Memory (DSM) systems. Our approach relies on the integration of checkpoints within synchronization barriers already existing in applications; this avoids the need to introduce an additional synchronization mechanism. The main advantage of our checkpointing mechanism is that performance...
An optimistic checkpointing and message logging approach for consistent global checkpoint collection in distributed systems
Checkpointing and rollback recovery are widely used techniques for achieving fault-tolerance in distributed systems. In this paper, we present a novel checkpointing algorithm which has the following desirable features: A process can independently initiate consistent global checkpointing by saving its current state, called a tentative checkpoint. Other processes come to know about a consistent g...
Publication date: 1997